This paper is concerned with a statistical estimation procedure in which measurements of a quantity are taken until two identical readings are obtained; this duplicated value is then taken as the estimate of the magnitude of the quantity concerned. The properties of this estimation procedure have been investigated numerically, under the assumptions that the individual observations are rounded values of measurements which have a normal distribution, and this estimator is compared with the arithmetic mean of two observations. It is shown that an arithmetic mean of two observations from the rounded distribution is almost always superior to the estimator described above. The exception is where the rounding interval is so wide and the rounding lattice is so advantageously placed that the only real reason for taking repeat measurements would be as a protection against gross errors.



The 1961 Book of ASTM Standards [3, p. 1131] gives tentative methods for calibrating upright tanks. These include a proposed method for obtaining circumferential measurements on certain types of tanks which consists, briefly, of wrapping the measuring tape around the tank (at some specified position) with tension applied to the tape, taking a reading to the nearest 0.005 foot, releasing and then reapplying the tension, taking another reading, and so on, until two identical readings are obtained. The value of the equal readings is then recorded as the circumferential measurement.

The method of estimation used above is familiar to all of us, for it is the method used whenever we count a (finite) number of things. We count them twice and if the two counts agree then we go no further; if they do not agree, then the items are counted again, etc., until two answers agree. The difference between the situations is that in the case of the circumferential measurements we can postulate a continuous distribution underlying the measurement process, so that a recorded observation is a rounded value of a continuous variable, whereas in the case of counting, the distribution of counts is discrete, with incorrect counts corresponding to actual mistakes.

We shall consider here the continuous case; more specifically, we shall look into the statistical properties of such an estimation procedure when the underlying distribution is normal. It will be shown that an arithmetic mean of two observations from the rounded distribution is almost always superior to the estimator described above for the range of cases considered. The exception occurs only when the rounding interval is so wide and the rounding lattice is so advantageously placed that taking repeat measurements serves simply as a protection against gross errors.

Since the placement of the rounding lattice relative to the true value is usually unknown, the observer cannot tell when the exceptional case has occurred. If the rounding interval is wide and the rounding lattice happens to be disadvantageously placed, estimation by duplication is very much inferior to the arithmetic mean of two observations.

Effect of Grouping 

The distribution of a single measurement depends on the width of the rounding interval, and also on the placement of the rounding lattice with respect to the true value. We shall assume that measurements are obtained as rounded values of the (continuous) random variable $X$, which is normally distributed about the true value $\mu$ of the property under consideration, with standard deviation $\sigma$. The variable $X$ is rounded to the nearest value $X_R$ in a rounding lattice where the interval, centered on $X_R$, is of length $2Q\sigma$ and the rounding lattice is placed so that the lower boundary of the interval containing $\mu$ is at $\mu + D\sigma$ $(-2Q < D \le 0)$. For simplicity, and with no loss of generality, we let $\mu = 0$ and $\sigma = 1$. Then the distribution of $X_R$ is given by

$$\operatorname{Prob}\{X_R = x_{Ri}\} = \operatorname{Prob}\{x_{Ri} - Q \le X \le x_{Ri} + Q\} = \frac{1}{\sqrt{2\pi}} \int_{x_{Ri}-Q}^{x_{Ri}+Q} e^{-z^2/2}\,dz,$$



where $x_{Ri} = D + (2i + 1)Q$, $i = 0, \pm 1, \pm 2, \ldots$. The distribution of $X_R$ will be completely specified when $Q$ and $D$ are given. We note that the distribution of $X_R$ is discrete, that the mean of $X_R$ is not necessarily zero (it depends on the position of the rounding lattice), and that the variance of $X_R$ is always greater than the variance of $X$ (see Eisenhart, Hastay, and Wallis [1],¹ ch. 4). Only a finite number of values of $X_R$ will have probability realistically different from zero (although, theoretically, there would be an infinite number of them). Strictly speaking, the analysis in this paper treats the normal distribution truncated to the interval $(\mu - 5\sigma, \mu + 5\sigma)$. Nevertheless, the numerical results are believed to be correct for the general case.

In studying the properties of this method of estimation by duplication, seven interval lengths were considered: $2Q = 3, 2, 1.5, 1, 0.75, 0.5,$ and $0.25$ (in units of $\sigma$); and five positions of the rounding lattice: $D = 0, -0.25Q, -0.5Q, -0.75Q,$ and $-Q$. This range of $D$ is sufficient for this study since the distributions of $X_R$ for $-2Q < D < -Q$ are mirror images of the distributions for $-Q < D < 0$. Figure 1 illustrates the nature of the distributions from which observations will be taken. Shown are the "best" (i.e., most advantageous) case, where $D = -Q$, the "worst" case, where $D = 0$, and an intermediate case, $D = -0.5Q$, for $2Q = 3.0$ and $2Q = 1.0$. Table 1 shows the mean and variance of the distribution of $X_R$ and the number, $m$, of rounding intervals (i.e., the number of values of $X_R$) necessary to cover the range $-5\sigma$ to $5\sigma$ of the normal curve for all cases considered here.

¹ Figures in brackets indicate the literature references at the end of this paper.

Figure 1. Distributions of $X_R$ for selected values of $2Q$ and $D$.
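The rounded distribution just described can be sketched numerically. The following is a minimal illustration, not the original computation: the function names are hypothetical, and the handling of the truncation (keeping the intervals that overlap $(-5, 5)$ and renormalizing their probabilities) is an assumption of this sketch. It takes $\mu = 0$ and $\sigma = 1$ as in the text.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rounded_normal(two_q, d, limit=5.0):
    """Return (values, probs) for X_R with mu = 0 and sigma = 1.

    two_q is the interval length 2Q; d is the offset D of the lower
    boundary of the interval containing zero (-2Q < D <= 0).
    Assumption of this sketch: only intervals overlapping
    (-limit, limit) are kept, and their probabilities are renormalized.
    """
    q = two_q / 2.0
    i_min = math.floor((-limit - d) / two_q)    # interval containing -limit
    i_max = math.ceil((limit - d) / two_q) - 1  # interval containing +limit
    values, probs = [], []
    for i in range(i_min, i_max + 1):
        center = d + (2 * i + 1) * q            # x_Ri = D + (2i + 1)Q
        values.append(center)
        probs.append(phi(center + q) - phi(center - q))
    total = sum(probs)
    return values, [p / total for p in probs]


def mean_and_variance(values, probs):
    """Mean and variance of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(v * v * p for v, p in zip(values, probs)) - mean ** 2
    return mean, var


if __name__ == "__main__":
    for two_q, d in [(3.0, 0.0), (3.0, -0.75), (3.0, -1.5)]:
        values, probs = rounded_normal(two_q, d)
        mean, var = mean_and_variance(values, probs)
        print(f"2Q={two_q}, D={d}: m={len(values)}, "
              f"mean={mean:.4f}, variance={var:.4f}")
```

For $2Q = 3.0$ the three lattice positions printed here correspond to the $D = 0$, $D = -0.5Q$, and $D = -Q$ columns of table 1, so the output can be compared with the tabled means, variances, and interval counts.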

Distribution of T 

Having specified the distribution of a single measurement, we may now turn to the estimator under consideration, which will be denoted by $T$. $T$ is the common value of the first two identical measurements in a sequence of measurements. Obviously the sample size $N$ required to obtain identical measurements, as well as their common value $T$, is a random variable. $N$ can take on the values $2, 3, \ldots$. For the purpose of computation, the number of intervals has been limited to $m$, so that the range of $N$ becomes $2, 3, \ldots, m+1$. Let

$$p_i = \operatorname{Prob}\{X_R = x_{Ri}\} = \frac{1}{\sqrt{2\pi}} \int_{x_{Ri}-Q}^{x_{Ri}+Q} e^{-z^2/2}\,dz.$$

Then the joint probability that $T = x_{Ri} = t_i$ and $N = n$ is given by

$$P\{t_i, n\} = (n-1)!\; p_i^2 \sum p_j p_k \cdots p_l, \qquad (1)$$



Table 1. Characteristics of the distribution of $X_R$: mean, variance, and number of intervals necessary to cover the range $-5\sigma$ to $5\sigma$ of the normal curve

2Q      Entry       D = 0     D = -0.25Q   D = -0.5Q   D = -0.75Q   D = -Q

3.00    mean        0*        0.0754       0.1065      0.0753       0
        variance    2.2986    2.1318       1.7380      1.3569       1.2027
        m           4         4            4           5            5

2.00    mean        0         0.0032       0.0046      0.0032       0
        variance    1.3650    1.3557       1.3333      1.3109       1.3016
        m           6         6            6           6            5

1.50    mean        0         0.0001       0.0001      0.0001       0
        variance    1.1882    1.1880       1.1875      1.1870       1.1868
        m           8         8            8           7            7

1.00    mean        0         0.0000       0.0000      0.0000       0
        variance    1.0833    1.0833       1.0833      1.0833       1.0833
        m           10        11           11          11           11

0.75    mean        0         0.0000       0.0000      0.0000       0
        variance    1.0469    1.0469       1.0469      1.0469       1.0469
        m           14        14           14          15           15

0.50    mean        0         0.0000       0.0000      0.0000       0
        variance    1.0208    1.0208       1.0208      1.0208       1.0208
        m           20        21           21          21           21

0.25    mean        0         0.0000       0.0000      0.0000       0
        variance    1.0052    1.0052       1.0052      1.0052       1.0052
        m           40        41           41          41           41

*In each cell of this table, the upper entry is the mean, the middle entry is the variance, and the lower entry is the number of intervals; "0" means zero exactly, while 0.0000 indicates that the value is zero to at least 4 decimal places.

where the summation is over all $(n-2)$-fold products such that $j < k < \cdots < l$ and $j, k, \ldots, l \ne i$. For $n = 2$ and $3$, (1) simplifies to

$$P\{t_i, 2\} = p_i^2 \quad \text{and} \quad P\{t_i, 3\} = 2 p_i^2 (1 - p_i). \qquad (1a)$$

By summing the quantities (1) and (1a) over all values of $n$ we obtain the distribution of $T$ independent of the value of $N$, from which we may deduce some of the properties of this measurement procedure.
Figures 2A and 3A show the distribution of $T$ for selected values of $2Q$ and $D$. The means and variances of $T$,

$$E(T) = \sum_{i=1}^{m} \sum_{n=2}^{m+1} t_i\, P\{t_i, n\}$$

and

$$\operatorname{Var}(T) = \sum_{i=1}^{m} \sum_{n=2}^{m+1} t_i^2\, P\{t_i, n\} - [E(T)]^2,$$

for all combinations of $Q$ and $D$ considered are given in table 2. (These and other tabled values are believed to be correct to the accuracy given, taking into account errors due to truncation of the normal distribution and due to rounding during calculation.) Note that the $E(T)$, which are biases of $T$ as an estimator of $\mu$, are considerably larger than the $E(X_R)$ (cf. table 1) for large intervals, except in the symmetric cases ($D = 0$ and $-Q$) where $E(T)$ and $E(X_R)$ are identically zero.
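Equation (1) and the moments of $T$ can be evaluated directly, as the following sketch illustrates. It is not the original program: the function names are hypothetical, the sum in (1) is computed as an elementary symmetric polynomial of the probabilities other than $p_i$ (a convenient restatement of the same sum), and the truncation is handled by renormalizing over the intervals overlapping $(-5, 5)$, which is an assumption. Equation (1) is read here as counting any repeated value in the sequence, not only consecutive repeats.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rounded_normal(two_q, d, limit=5.0):
    """Values t_i and probabilities p_i of X_R (mu = 0, sigma = 1).

    Assumption of this sketch: keep the intervals overlapping
    (-limit, limit) and renormalize their probabilities.
    """
    q = two_q / 2.0
    i_min = math.floor((-limit - d) / two_q)
    i_max = math.ceil((limit - d) / two_q) - 1
    values = [d + (2 * i + 1) * q for i in range(i_min, i_max + 1)]
    probs = [phi(v + q) - phi(v - q) for v in values]
    total = sum(probs)
    return values, [p / total for p in probs]


def elementary_symmetric(xs):
    """Return [e_0, e_1, ..., e_k] for the k numbers in xs."""
    e = [1.0] + [0.0] * len(xs)
    for count, x in enumerate(xs, start=1):
        for r in range(count, 0, -1):
            e[r] += x * e[r - 1]
    return e


def joint_distribution(probs):
    """joint[i][n] = P{t_i, n} = (n-1)! p_i^2 e_{n-2}(all p_j with j != i)."""
    m = len(probs)
    joint = []
    for i, p_i in enumerate(probs):
        e = elementary_symmetric(probs[:i] + probs[i + 1:])
        joint.append({n: math.factorial(n - 1) * p_i ** 2 * e[n - 2]
                      for n in range(2, m + 2)})
    return joint


def moments_of_t(two_q, d):
    """Return (E(T), Var(T)) from the marginal distribution of T."""
    values, probs = rounded_normal(two_q, d)
    marginal = [sum(row.values()) for row in joint_distribution(probs)]
    mean = sum(t * w for t, w in zip(values, marginal))
    var = sum(t * t * w for t, w in zip(values, marginal)) - mean ** 2
    return mean, var


if __name__ == "__main__":
    for two_q, d in [(3.0, 0.0), (3.0, -0.75), (3.0, -1.5)]:
        mean, var = moments_of_t(two_q, d)
        print(f"2Q={two_q}, D={d}: E(T)={mean:.3f}, Var(T)={var:.3f}")
```

The printed values can be compared with the $2Q = 3.0$ row of table 2 (columns $D = 0$, $-0.5Q$, and $-Q$); agreement to about three decimals is what one would expect if the sketch matches the tabled computation.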

Expected values of $T$ conditional on $N = n$, $E(T \mid n)$, and variances of $T$ conditional on the value of $N$, $\operatorname{Var}(T \mid n)$, were also calculated, but they are not reported here. It turned out that, for the non-symmetric positions of the rounding lattice, the conditional biases, $E(T \mid n)$, are generally larger than the corresponding expected values of $X_R$ and depend not only on the size and placement of the interval but also on the sample size, $n$. But more disconcerting is the fact that $\operatorname{Var}(T \mid n)$ increases rather than decreases as $n$ increases. Thus one would be better off with an estimate obtained with $N = 2$ than with a larger $N$. But there is no control over the sample size since $N$, too, is a random variable. It can be shown that for the maximum value of $N$, $N = m+1$, the (conditional) distribution of $T$ is identically the distribution of $X_R$ (by (1), $P\{t_i, m+1\} = m!\, p_i \prod_j p_j$, which is proportional to $p_i$). Thus an estimate based on the maximum sample size is no better than a single observation from the rounded distribution. While $\operatorname{Prob}\{N = m+1\} \ll 0.0005$, this still is not a happy situation.

Figure 2. Distributions of $T$ and the corresponding distributions of $\bar{X}_{R2}$ for $2Q = 3.0$ and selected values of $D$.

Figure 3. Distributions of $T$ and the corresponding distributions of $\bar{X}_{R2}$ for $2Q = 1.0$ and selected values of $D$.
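The conditional quantities discussed above are not tabled in the paper, but they follow from equation (1) by normalizing $P\{t_i, n\}$ over $i$ for each fixed $n$. The sketch below shows one way to do this; the function names and the renormalized truncation are assumptions of the sketch, not the original computation.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rounded_normal(two_q, d, limit=5.0):
    """Values and renormalized probabilities of X_R (mu = 0, sigma = 1)."""
    q = two_q / 2.0
    i_min = math.floor((-limit - d) / two_q)
    i_max = math.ceil((limit - d) / two_q) - 1
    values = [d + (2 * i + 1) * q for i in range(i_min, i_max + 1)]
    probs = [phi(v + q) - phi(v - q) for v in values]
    total = sum(probs)
    return values, [p / total for p in probs]


def elementary_symmetric(xs):
    """Return [e_0, e_1, ..., e_k] for the k numbers in xs."""
    e = [1.0] + [0.0] * len(xs)
    for count, x in enumerate(xs, start=1):
        for r in range(count, 0, -1):
            e[r] += x * e[r - 1]
    return e


def joint_distribution(probs):
    """joint[i][n] = P{t_i, n} as in equation (1)."""
    m = len(probs)
    joint = []
    for i, p_i in enumerate(probs):
        e = elementary_symmetric(probs[:i] + probs[i + 1:])
        joint.append({n: math.factorial(n - 1) * p_i ** 2 * e[n - 2]
                      for n in range(2, m + 2)})
    return joint


def conditional_moments(two_q, d):
    """Return {n: (Prob{N = n}, E(T | n), Var(T | n))} for n = 2..m+1."""
    values, probs = rounded_normal(two_q, d)
    joint = joint_distribution(probs)
    result = {}
    for n in range(2, len(probs) + 2):
        weights = [row[n] for row in joint]
        total = sum(weights)
        mean = sum(t * w for t, w in zip(values, weights)) / total
        second = sum(t * t * w for t, w in zip(values, weights)) / total
        result[n] = (total, mean, second - mean ** 2)
    return result


if __name__ == "__main__":
    for n, (prob, mean, var) in conditional_moments(3.0, 0.0).items():
        print(f"n={n}: Prob={prob:.6f}, E(T|n)={mean:.4f}, Var(T|n)={var:.4f}")
```

For $2Q = 3.0$ and $D = 0$ the conditional variances printed by this sketch grow with $n$, and the last one coincides with the variance of $X_R$ given in table 1, in line with the remark about $N = m+1$.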



Table 2. Values of the mean and variance of the (unconditional) distribution of $T$ for selected values of $2Q$ and $D$

2Q      Entry       D = 0     D = -0.25Q   D = -0.5Q   D = -0.75Q   D = -Q

3.0     mean        0*        0.259        0.351       0.238        0
        variance    2.250     1.850        1.049       0.453        0.258

2.0     mean        0         0.032        0.046       0.032        0
        variance    1.038     0.982        0.848       0.717        0.664

1.5     mean        0         -0.002       -0.002      -0.002       0
        variance    0.743     0.741        0.736       0.732        0.730

1.0     mean        0         0.000        0.000       0.000        0
        variance    0.652     0.652        0.652       0.652        0.652

0.75    mean        0         -0.000       -0.000      -0.000       0
        variance    0.618     0.618        0.618       0.618        0.618

0.5     mean        0         0.000        0.000       0.000        0
        variance    0.589     0.589        0.589       0.589        0.589

0.25    mean        0         0.000        0.000       0.000        0
        variance    0.560     0.560        0.560       0.560        0.560

*In each cell of table 2, the upper entry is the mean and the lower entry is the variance; "0" means zero exactly, while 0.000 indicates that the value is zero to at least 3 decimal places.

By summing the quantities (1) and (1a) over all values of $i$ we obtain the distribution of $N$. The probabilities,

$$\operatorname{Prob}\{N = n\} = \sum_{i=1}^{m} P\{t_i, n\},$$

and the means and variances of $N$,

$$E(N) = \sum_{n=2}^{m+1} n \operatorname{Prob}\{N = n\}, \qquad \operatorname{Var}(N) = \sum_{n=2}^{m+1} n^2 \operatorname{Prob}\{N = n\} - [E(N)]^2,$$

are given in table 3 for the various distributions of $X_R$. Probabilities less than 0.0005 are not reported. We see that, for the range of intervals considered here, one would ordinarily expect to take from 3 to 6 observations to obtain an estimate by this procedure. Only when $2Q = 3.0$ and the rounding lattice is advantageously placed would one expect to obtain an estimate at $N = 2$, but in these cases the only real reason for taking repeat observations would be as a protection against gross errors.
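The distribution of $N$ can also be reached by a shortcut that avoids summing (1) over $i$. The sketch below uses the identity $\operatorname{Prob}\{N > n\} = n!\, e_n(p_1, \ldots, p_m)$, where $e_n$ is the $n$-th elementary symmetric polynomial, since $N > n$ exactly when the first $n$ readings are all different. This is an equivalent route chosen for brevity, not the summation described in the paper; function names and the renormalized truncation are assumptions.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rounded_normal(two_q, d, limit=5.0):
    """Values and renormalized probabilities of X_R (mu = 0, sigma = 1)."""
    q = two_q / 2.0
    i_min = math.floor((-limit - d) / two_q)
    i_max = math.ceil((limit - d) / two_q) - 1
    values = [d + (2 * i + 1) * q for i in range(i_min, i_max + 1)]
    probs = [phi(v + q) - phi(v - q) for v in values]
    total = sum(probs)
    return values, [p / total for p in probs]


def elementary_symmetric(xs):
    """Return [e_0, e_1, ..., e_k] for the k numbers in xs."""
    e = [1.0] + [0.0] * len(xs)
    for count, x in enumerate(xs, start=1):
        for r in range(count, 0, -1):
            e[r] += x * e[r - 1]
    return e


def distribution_of_n(probs):
    """Return {n: Prob{N = n}} for n = 2..m+1.

    Uses Prob{N > n} = n! e_n(p): the first n readings are all distinct.
    """
    m = len(probs)
    e = elementary_symmetric(probs)
    survive = [math.factorial(n) * e[n] for n in range(m + 1)]
    survive.append(0.0)  # N cannot exceed m + 1
    return {n: survive[n - 1] - survive[n] for n in range(2, m + 2)}


def moments_of_n(dist):
    """Return (E(N), Var(N)) for a distribution {n: probability}."""
    mean = sum(n * p for n, p in dist.items())
    var = sum(n * n * p for n, p in dist.items()) - mean ** 2
    return mean, var


if __name__ == "__main__":
    for two_q, d in [(3.0, 0.0), (3.0, -1.5), (1.0, 0.0)]:
        _, probs = rounded_normal(two_q, d)
        dist = distribution_of_n(probs)
        mean, var = moments_of_n(dist)
        print(f"2Q={two_q}, D={d}: Prob(N=2)={dist[2]:.3f}, "
              f"E(N)={mean:.3f}, Var(N)={var:.3f}")
```

The output for each case can be checked against the corresponding column of table 3.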



Comparison of T With $\bar{X}_{R2}$

Since the arithmetic mean is the best estimator of the mean of a normal distribution, it is reasonable to compare $T$ with this fixed sample size estimator. The arithmetic mean of two observations from the distribution of $X_R$, denoted $\bar{X}_{R2}$, is chosen for comparison with $T$ because, as will be seen below, it is almost always a better estimator of $\mu$ than is $T$ for the range of $Q$ and $D$ considered here.
Table 3. Values of Prob{N = n}, and of the mean and variance of the distribution of N, for selected values of 2Q and D

n         D = 0     D = -0.25Q   D = -0.5Q   D = -0.75Q   D = -Q

2Q = 3.0
2         0.497     0.537        0.631       0.722        0.760
3         0.499     0.457        0.356       0.258        0.217
4         0.004     0.006        0.013       0.020        0.023
E(N)      2.507     2.470        2.382       2.298        2.264
Var(N)    0.258     0.262        0.262       0.249        0.241

2Q = 2.0
2         0.457     0.465        0.486       0.507        0.516
3         0.478     0.464        0.429       0.394        0.379
4         0.062     0.068        0.083       0.098        0.104
5         0.003     0.003        0.002       0.001        0.001
E(N)      2.611     2.608        2.601       2.594        2.591
Var(N)    0.379     0.391        0.418       0.445        0.457

2Q = 1.5
2         0.384     0.385        0.388       0.390        0.391
3         0.441     0.438        0.429       0.420        0.416
4         0.153     0.157        0.165       0.174        0.177
5         0.021     0.020        0.018       0.016        0.015
E(N)      2.812     2.813        2.815       2.817        2.818
Var(N)    0.588     0.590        0.594       0.599        0.600

2Q = 1.0
2         0.271     0.271        0.271       0.271        0.271
3         0.373     0.373        0.372       0.372        0.372
4         0.252     0.252        0.253       0.253        0.254
5         0.090     0.089        0.089       0.088        0.088
6         0.014     0.014        0.014       0.015        0.015
7         0.001     0.001        0.001       0.001        0.001
E(N)      3.206     3.206        3.206       3.206        3.206
Var(N)    0.980     0.979        0.979       0.979        0.979

2Q = 0.75
2         0.207     0.207        0.207       0.207        0.207
3         0.315     0.315        0.315       0.315        0.315
4         0.271     0.271        0.271       0.271        0.271
5         0.147     0.147        0.147       0.147        0.147
6         0.050     0.050        0.050       0.050        0.050
7         0.010     0.010        0.010       0.010        0.010
8         0.001     0.001        0.001       0.001        0.001
E(N)      3.551     3.551        3.551       3.551        3.551
Var(N)    1.389     1.389        1.389       1.389        1.389

2Q = 0.5
2         0.140     0.140        0.140       0.140        0.140
3         0.234     0.234        0.234       0.234        0.234
4         0.248     0.248        0.248       0.248        0.248
5         0.193     0.193        0.193       0.193        0.193
6         0.114     0.114        0.114       0.114        0.114
7         0.050     0.050        0.050       0.050        0.050
8         0.016     0.016        0.016       0.016        0.016
9         0.004     0.004        0.004       0.004        0.004
10        0.001     0.001        0.001       0.001        0.001
E(N)      4.145     4.145        4.145       4.145        4.145
Var(N)    2.248     2.248        2.248       2.248        2.248

2Q = 0.25
2         0.070     0.070        0.070       0.070        0.070
3         0.129     0.129        0.129       0.129        0.129
4         0.165     0.165        0.165       0.165        0.165
5         0.172     0.172        0.172       0.172        0.172
6         0.155     0.155        0.155       0.155        0.155
7         0.122     0.122        0.122       0.122        0.122
8         0.085     0.085        0.085       0.085        0.085
9         0.052     0.052        0.052       0.052        0.052
10        0.028     0.028        0.028       0.028        0.028
11        0.013     0.013        0.013       0.013        0.013
12        0.006     0.006        0.006       0.006        0.006
13        0.002     0.002        0.002       0.002        0.002
14        0.001     0.001        0.001       0.001        0.001
E(N)      5.511     5.511        5.511       5.511        5.511
Var(N)    4.957     4.957        4.957       4.957        4.957
Note that the mean of $\bar{X}_{R2}$ is the same as the mean of $X_R$ and that $\operatorname{Var}(\bar{X}_{R2}) = \operatorname{Var}(X_R)/2$. Figures 2B and 3B show the distributions of $\bar{X}_{R2}$ for $2Q = 3.0$ and $1.0$ and $D = 0$, $-0.5Q$, and $-Q$. We see that the spacing between possible values of the estimates has decreased to half of the width of the original rounding interval so that if the closest value is not obtained, the size of the miss need not be as large as for $T$ and the probability of a large miss is smaller.

For comparing $T$ with $\bar{X}_{R2}$, relative efficiency will be used:

$$\text{Efficiency of } T \text{ relative to } \bar{X}_{R2} = \frac{\operatorname{Var}(X_R)/2 + [E(X_R)]^2}{\operatorname{Var}(T) + [E(T)]^2}$$

(which is usually expressed as a percentage). This is the ratio of the mean squared error of $\bar{X}_{R2}$ to that of $T$, each taken about the true value $\mu = 0$. The relative efficiencies are given in table 4. We see that only in two instances does $T$ show superior behavior over $\bar{X}_{R2}$, namely where $2Q = 3.0$ and $D = -0.75Q$ or $-Q$, as indicated by relative efficiencies of 134.3 percent and 232.9 percent respectively. The explanation for these high efficiencies lies in the fact that, in such cases, if an estimate is obtained at $N = 2$ (which, by reference to table 3, happens more than 70 percent of the time), that estimate is almost sure to be the value in the interval containing $\mu = 0$. This leads to a very small variance for the distribution of $T$ conditional on $N = 2$, which offsets the larger variances for $N > 2$. Actually, this effect is also working at the same positions of the rounding lattice for $2Q = 2.0$ and $2Q = 1.5$ but not to the same extent. In all these cases, the conditional variance of $T$ for $N = 2$ is smaller than the variance of an arithmetic mean of 2 observations, so that it is possible to obtain better results using estimation by duplication, although these circumstances are limited and not within the control of the observer. For $2Q \le 1.0$, $T$ is never superior to the mean of 2 observations.
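As a worked check of the efficiency formula, the sketch below recomputes the $2Q = 3.0$ row of table 4 from the rounded entries of tables 1 and 2. The function and variable names are hypothetical. Because the inputs are already rounded to three or four decimals, the results should be expected to agree with table 4 only to within a few tenths of a percentage point.

```python
def relative_efficiency(mean_xr, var_xr, mean_t, var_t):
    """Efficiency of T relative to the mean of two observations, in percent.

    Numerator: mean squared error of the two-observation mean,
    Var(X_R)/2 + [E(X_R)]^2.  Denominator: Var(T) + [E(T)]^2.
    """
    mse_mean_of_two = var_xr / 2.0 + mean_xr ** 2
    mse_t = var_t + mean_t ** 2
    return 100.0 * mse_mean_of_two / mse_t


# (D, E(X_R), Var(X_R), E(T), Var(T)) for 2Q = 3.0, copied from tables 1 and 2.
ROWS_2Q_3 = [
    ("0",      0.0,    2.2986, 0.0,   2.250),
    ("-0.25Q", 0.0754, 2.1318, 0.259, 1.850),
    ("-0.5Q",  0.1065, 1.7380, 0.351, 1.049),
    ("-0.75Q", 0.0753, 1.3569, 0.238, 0.453),
    ("-Q",     0.0,    1.2027, 0.0,   0.258),
]

if __name__ == "__main__":
    for label, mean_xr, var_xr, mean_t, var_t in ROWS_2Q_3:
        eff = relative_efficiency(mean_xr, var_xr, mean_t, var_t)
        print(f"D = {label:>7}: efficiency = {eff:.1f} percent")
```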

Table 4. Efficiency of $T$ relative to the arithmetic mean of two observations for selected values of $2Q$ and $D$ (percent)

2Q      D = 0     D = -0.25Q   D = -0.5Q   D = -0.75Q   D = -Q

3.0     51.1      55.9         75.1        134.3        232.9
2.0     65.8      69.0         78.4        91.3         98.1
1.5     80.0      80.2         80.6        81.1         81.3
1.0     83.1      83.1         83.1        83.1         83.1
0.75    84.7      84.7         84.7        84.7         84.7
0.5     86.7      86.7         86.7        86.7         86.7
0.25    89.8      89.8         89.8        89.8         89.8

Comparison of T and $\bar{X}_{R2}$ When the True Value Is Considered To Be a Random Variable

We have seen what results can be expected using estimation by duplication on a single object when the true value of the quantity being estimated is at zero and the rounding lattice is at certain fixed positions relative to zero. If we consider the rounding lattice to be placed at random (which is equivalent to having the rounding lattice centered on zero and assuming that the true values of the quantities are uniformly distributed between $-Q$ and $Q$ [1, ch. 4]), it is apparent that, at least for the interval sizes considered here, one would be better off using the mean of two observations as an estimate of the true value rather than the repeated value; for even in the case $2Q = 3.0$ one would expect an efficiency of $T$ relative to $\bar{X}_{R2}$ greater than 100 percent only about one-third of the time.

Values for quantities similar to those given in tables 1 through 4 can be calculated under the new assumptions. We note first that $T$ is unbiased under these assumptions, due to symmetry, and that $\bar{X}_{R2}$ is also unbiased for the same reason. Youden, Connor, and Severo [2] have calculated the probability that $N = 2$ for intervals of length $2Q = 3.0$, $2.0$, and $1.0$. These probabilities are 0.6296, 0.4860, and 0.2709, respectively. Since estimation by duplication yields results superior to the taking of an arithmetic mean only when $N = 2$, as indicated by the conditional variance being smaller than the variance of the mean of two observations, we may use these probabilities to obtain an estimate of the probability of obtaining better results using estimation by duplication. The distribution of $T$ conditional on $N = 2$ for $2Q = 3.0$ and $2.0$ has variance smaller than $\operatorname{Var}(X_R)/2$ for almost all values of $D$ between $-0.5Q$ and $-Q$, that is, counting the mirror-image positions, for about half of all lattice placements. Thus if $2Q = 3.0$, the probability of obtaining better results with $T$ is approximately $(0.6296)(0.5) = 0.3148$; and if $2Q = 2.0$, the probability is approximately $(0.4860)(0.5) = 0.2430$. Since the behavior of $T$ is never better than $\bar{X}_{R2}$ for $2Q = 1.0$, the probability of obtaining better results with $T$ is zero.
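The probabilities quoted from Youden, Connor, and Severo [2] can be approximated by averaging $\operatorname{Prob}\{N = 2 \mid D\} = \sum_i p_i^2$ over a lattice position $D$ uniformly distributed on $(-2Q, 0)$. The sketch below does this with a midpoint rule; it is an illustration under stated assumptions (hypothetical function names, a fixed number of grid points, and intervals summed out to eight standard deviations), not the method used in [2].

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_n_equals_2(two_q, d, reach=8.0):
    """Prob{N = 2 | D} = sum of p_i^2 over the rounding intervals.

    Intervals are [D + 2Q*i, D + 2Q*(i + 1)]; those beyond +/- reach
    standard deviations contribute negligibly and are ignored.
    """
    i_min = math.floor((-reach - d) / two_q)
    i_max = math.ceil((reach - d) / two_q)
    total = 0.0
    for i in range(i_min, i_max + 1):
        lower = d + two_q * i
        p = phi(lower + two_q) - phi(lower)
        total += p * p
    return total


def average_over_lattice(two_q, steps=400):
    """Midpoint-rule average of Prob{N = 2 | D} for D uniform on (-2Q, 0)."""
    return sum(prob_n_equals_2(two_q, -two_q * (k + 0.5) / steps)
               for k in range(steps)) / steps


if __name__ == "__main__":
    for two_q in (3.0, 2.0, 1.0):
        p2 = average_over_lattice(two_q)
        print(f"2Q={two_q}: average Prob(N=2) = {p2:.4f}, "
              f"half of it = {0.5 * p2:.4f}")
```

The second printed figure corresponds to the products $(0.6296)(0.5)$ and $(0.4860)(0.5)$ in the text; for $2Q = 1.0$ it has no such interpretation, since $T$ is never better there.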

While the other quantities, such as the mean and variance of the marginal distribution of $T$, may be of interest, they are difficult to obtain under the new assumptions to any accuracy and would only point more to the fact that estimation by duplication is not a good estimation procedure to use when the underlying distribution is normal. Rough estimates of some of the quantities may be obtained by averaging the appropriate values in the tables given.

From the preceding discussion it follows that when the true value of the quantity to be measured is considered to be uniformly distributed in an interval of length $2Q\sigma$ and measurements of that quantity are normally distributed about its true value with standard deviation $\sigma$, then for $2Q \le 3.0$, the probability is at most 0.3148 that estimation by duplication is better than the arithmetic mean of two observations. For $2Q \le 1.0$ better results can always be obtained with the arithmetic mean of only two observations.

In conclusion it appears that the practice of taking readings until two identical readings are obtained cannot be justified, since the average of the first two readings almost always yields a better estimate of the measured quantity.

The author thanks Churchill Eisenhart for suggesting 
this investigation and for his guidance, and Joseph M. 
Cameron and Joan R. Rosenblatt for their helpful 
suggestions for writing this paper. 